Machine Learning Final Project - Stock Prediction and Analysis - Team 14

Topic

Stock trading
Decide whether to go long or sell short by predicting whether prices will rise or fall, and maximize the profit.

Motivations

  • Motivation: On the personal side, we naturally hope what we learn can help us make money. On the societal side, AI has already surpassed humans in some areas, such as Go, so can it offer significant help in finance as well? Finance is closer to people's daily lives than most other fields, so applying AI there could change life dramatically; and since investing involves so many variables, we are curious whether AI-driven investing can beat humans.
  • Importance: AI has matured considerably, yet the fields that really use it are still few; so far only games, vision, and natural language processing show outstanding results. AI still has to grow into domains that can greatly change people's lives, which will make its influence on humanity clearer. Combining AI with finance would change daily life substantially. Moreover, investing is not only about mathematics; it is also a study of human behavior and psychology, and whether AI can take those into account is what we are curious about.

Formulate the Problem

  • The starting capital is NT$10,000 in cash; the goal is to end up with as much money as possible
  • Every day, decide whether to go long or sell short each of ten fixed stocks (0050, 0056, Hon Hai, TSMC, MediaTek, Largan, Fubon FHC, Cathay Holdings, E.S.F.H, Yuanta Group)
  • The features will be the previous open, high, low, close, volume, 5-day average price, 20-day average price, online buzz (if we manage to learn web scraping), and so on; the targets are whether to buy or sell each of the ten stocks
  • Assume any buy order can always be filled and any holding can always be sold
  • Assume stocks are bought at the previous day's closing price, and short sales are likewise priced at the previous day's closing price
  • Assume the transaction fee is 0.1% of the traded amount
  • After predicting which stocks will rise and which will fall: among the stocks predicted to rise by more than the fee, invest 50% of the cash on hand in the one with the highest expected return, then 50% of the remaining cash in the second highest, and so on, putting all the cash that is left into the lowest-ranked stock still expected to rise by more than the fee; all holdings are sold the next day. Stocks predicted to fall by more than the fee are handled the same way on the short side: borrow shares worth about 50% of the cash on hand for the stock with the highest expected short return, shares worth about 25% of the remaining cash for the second highest, and so on, finally borrowing the same value as the previous tier for the last stock still expected to fall by more than the fee; these positions are also closed out the next day. Every position is settled the following day (see the sizing sketch after this list).
  • It is also allowed to buy and sell nothing at all
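
Below is a minimal sketch of the long-side sizing rule described above, assuming preds maps stock names to predicted next-day returns (the names and the long_allocations helper are ours, not part of the notebook):

FEE = 0.001  # 0.1% transaction fee

def long_allocations(preds, cash):
    # stocks predicted to rise by more than the fee, best first
    rising = sorted((s for s, r in preds.items() if r > FEE),
                    key=lambda s: preds[s], reverse=True)
    alloc, remaining = {}, cash
    for i, s in enumerate(rising):
        # 50% of the remaining cash per tier; the last tier takes all that is left
        stake = remaining if i == len(rising) - 1 else 0.5 * remaining
        alloc[s] = stake
        remaining -= stake
    return alloc

# long_allocations({'2330': 0.02, '2317': 0.01, '0050': -0.005}, 10000)
# -> {'2330': 5000.0, '2317': 5000.0}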

The dataset and attribute information (6 years)

  • stock:
    • 0050 Yuanta Taiwan Top50: 0050 元大台灣50
    • 0056 PTD: 0056 元大高股息
    • 2317 Hon Hai Precision: 2317 鴻海
    • 2330 TSMC: 2330 台積電
    • 2454 MediaTek: 2454 聯發科
    • 3008 Largan: 3008 大立光
    • 2881 Fubon FHC: 2881 富邦金
    • 2882 Cathay Holdings: 2882 國泰金
    • 2884 E.S.F.H: 2884 玉山金
    • 2885 Yuanta Group: 2885 元大金
  • attribute:
    • CO_ID: company code
    • Date: date (year/month/day)
    • Open(NTD): opening price (NTD)
    • High(NTD): daily high (NTD)
    • Low(NTD): daily low (NTD)
    • Close(NTD): closing price (NTD)
    • Volume(1000S): trading volume (thousand shares)
    • Amount(NTD1000): trading value (NTD thousand)
    • AVG CLOSE: average price that day (NTD)
    • AVG CLOSE 5D: 5-day average price (NTD)
    • AVG CLOSE 10D: 10-day average price (NTD)
    • AVG CLOSE 20D: 20-day average price (NTD)
    • AVG Vol 5D: 5-day average volume
    • AVG Vol 10D: 10-day average volume
    • AVG Vol 20D: 20-day average volume
    • ROI%: rate of return (%)
    • Shares(1000S): shares outstanding (thousand shares)
    • Market Cap.(NTD MN): market capitalization (NTD million)
    • P/E-TEJ: price-to-earnings ratio (TEJ)
    • P/B-TEJ: price-to-book ratio (TEJ)
    • Dividend_Yield%: dividend yield (%)
    • Cash_Dividend%: cash dividend yield (%)
    • Price_Change(NTD): daily price change (NTD)
    • High minus Low %: high-low spread (%)
    • Market: listing market
    • Capital: capital
    • No.of Employee: number of employees

import every module I need

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import math
# suppress numpy divide-by-zero / invalid-value warnings
np.seterr(divide='ignore', invalid='ignore')
Out[1]:
{'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}

Gathering Data

online open datasets and some data parsers

  • data sources: TEJ, Yahoo Finance, Apple Stocks

read data

In [2]:
stock_data = pd.read_csv('stock(eng).csv', encoding='utf-8', thousands=',')
print("dataset_shape = ", stock_data.shape)
dataset_shape =  (14020, 27)

Questionnaire surveys

Number of people, out of 50 surveyed, who had heard of each stock

In [3]:
# 0050 Yuanta Taiwan Top50, 0056 PTD, Hon Hai, TSMC, MediaTek, Largan, Fubon FHC, Cathay Holdings, E.S.F.H, Yuanta Group
reputation = [37, 15, 43, 49, 44, 20, 25, 26, 22, 24]

Define a Problem

  • Same setup and assumptions as in "Formulate the Problem" above, plus:
  • Predict which stocks will rise and which will fall
  • Go long the three stocks with the highest predicted positive returns
  • Sell short the three stocks with the lowest (most negative) predicted returns, as sketched below
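
These top-3 and bottom-3 picks can be read off a prediction vector with np.argsort; a minimal sketch (the variable names and example numbers are ours):

import numpy as np

preds = np.array([1.2, -0.4, 3.1, 0.0, -2.2, 0.7, 2.5, -1.1, 0.3, -0.6])  # one per stock
order = np.argsort(preds)     # indices sorted by predicted return, ascending
long_idx = order[-3:][::-1]   # three highest predicted returns -> [2, 6, 0]
short_idx = order[:3]         # three lowest predicted returns  -> [4, 7, 9]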

Data Processing has 3 parts

Data Processing - Part 1

encodings

Label Encoding

In [4]:
# ordinal label for each stock, derived from the survey counts (see the note below)
reputation = np.argsort(reputation) + 1
In [5]:
# each stock has 1402 trading days over the 6 years
reputation = np.repeat(reputation, 1402)
In [6]:
stock_data['Reputation'] = reputation 
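
A side note on the encoding above: np.argsort returns the permutation that sorts the array, not each element's rank, so the labels are ordinal but not the stocks' actual familiarity ranks. A true rank encoding, if that was the intent, takes a double argsort; a minimal sketch with our own variable names:

import numpy as np

counts = np.array([37, 15, 43, 49, 44, 20, 25, 26, 22, 24])
perm = np.argsort(counts) + 1              # used above -> [2, 6, 9, 10, 7, 8, 1, 3, 5, 4]
rank = np.argsort(np.argsort(counts)) + 1  # rank per stock -> [7, 1, 8, 10, 9, 2, 5, 6, 3, 4]
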
In [7]:
stock_data.head()
Out[7]:
CO_ID Date Open(NTD) High(NTD) Low(NTD) Close(NTD) Volume(1000S) Amount(NTD1000) AVG CLOSE AVG CLOSE 5D ... P/E-TEJ P/B-TEJ Dividend_Yield% Cash_Dividend% Price_Change(NTD) High minus Low % Market Capital No.of Employee Reputation
0 0050 Yuanta Taiwan Top50 2015/1/5 66.40 66.75 66.00 66.55 6295 417637 66.3379 66.81 ... NaN NaN NaN 2.33 -0.30 1.1219 TSE NaN NaN 2
1 0050 Yuanta Taiwan Top50 2015/1/6 65.75 65.75 64.75 64.90 19501 1272547 65.2527 66.47 ... NaN NaN NaN 2.39 -1.65 1.5026 TSE NaN NaN 2
2 0050 Yuanta Taiwan Top50 2015/1/7 64.70 65.25 64.70 65.00 6991 454539 65.0127 66.03 ... NaN NaN NaN 2.38 0.10 0.8475 TSE NaN NaN 2
3 0050 Yuanta Taiwan Top50 2015/1/8 65.50 66.60 65.50 66.50 13153 871151 66.2295 65.96 ... NaN NaN NaN 2.33 1.50 1.6923 TSE NaN NaN 2
4 0050 Yuanta Taiwan Top50 2015/1/9 66.90 66.95 66.05 66.15 5891 391342 66.4195 65.82 ... NaN NaN NaN 2.34 -0.35 1.3534 TSE NaN NaN 2

5 rows × 28 columns

data clean-ups

In [8]:
print("is_any_null ", stock_data.isnull().values.any())
is_any_null  True

preprocessing: drop columns that are mostly NaN and fill the remaining NaNs with the column mean

In [9]:
# keep only columns that are at least 80% non-null, then mean-impute what is left
stock_data = stock_data.dropna(thresh=len(stock_data.index) * 0.8, axis=1)
stock_data = stock_data.fillna(stock_data.mean())

standardization

observe the data

In [10]:
stock_data.min()
Out[10]:
CO_ID                  0050 Yuanta Taiwan Top50
Date                                  2015/1/12
Open(NTD)                                    10
High(NTD)                                 10.15
Low(NTD)                                   9.97
Close(NTD)                                 9.97
Volume(1000S)                                 0
Amount(NTD1000)                               0
AVG CLOSE                               10.0197
AVG CLOSE 5D                                  0
AVG CLOSE 10D                             10.13
AVG CLOSE 20D                             10.32
AVG Vol 5D                                    0
AVG Vol 10D                             251.982
AVG Vol 20D                             271.137
ROI%                                        -10
Shares(1000S)                            134140
Market Cap.(NTD MN)                        4320
P/E-TEJ                                    5.53
P/B-TEJ                                    0.59
Dividend_Yield%                            0.89
Cash_Dividend%                                0
Price_Change(NTD)                          -465
High minus Low %                              0
Market                                      TSE
Capital                              1.3414e+09
No.of Employee                             7719
Reputation                                    1
dtype: object
In [11]:
stock_data.mean()
Out[11]:
Open(NTD)              4.718116e+02
High(NTD)              4.777744e+02
Low(NTD)               4.652290e+02
Close(NTD)             4.708155e+02
Volume(1000S)          1.804997e+04
Amount(NTD1000)        2.027967e+06
AVG CLOSE              4.716163e+02
AVG CLOSE 5D           4.705980e+02
AVG CLOSE 10D          4.703167e+02
AVG CLOSE 20D          4.696966e+02
AVG Vol 5D             1.805096e+04
AVG Vol 10D            1.803169e+04
AVG Vol 20D            1.801927e+04
ROI%                   4.636046e-02
Shares(1000S)          8.897965e+06
Market Cap.(NTD MN)    9.624397e+05
P/E-TEJ                1.504496e+01
P/B-TEJ                2.296988e+00
Dividend_Yield%        4.227955e+00
Cash_Dividend%         3.663234e+00
Price_Change(NTD)      1.396747e-01
High minus Low %       1.391597e+00
Capital                1.155307e+11
No.of Employee         1.197838e+05
Reputation             5.500000e+00
dtype: float64

normalization: shrink the data's order of magnitude. Because this is a time series, z-score standardization cannot be used (the scaler's mean and variance would mix information from future dates into earlier rows)

In [12]:
from sklearn.preprocessing import StandardScaler  # imported for reference only; unused, since z-score scaling is ruled out above
In [13]:
# rescale columns whose magnitudes dwarf the rest:
# columns whose minimum exceeds 1000 -> divide by 1000
min_too_big = stock_data._get_numeric_data().min() > 1000
min_too_big_attribute = set(min_too_big[min_too_big == True].index)
stock_data[list(min_too_big_attribute)] /= 1000
# columns whose mean exceeds 1e6 -> divide by 1e6
mean_too_large = stock_data.mean() > 1000000
mean_too_large_attribute = set(mean_too_large[mean_too_large == True].index)
stock_data[list(mean_too_large_attribute)] /= 1000000
# columns whose mean still exceeds 1000 -> divide by 1000
mean_too_big = stock_data.mean() > 1000
mean_too_big_attribute = set(mean_too_big[mean_too_big == True].index)
stock_data[list(mean_too_big_attribute)] /= 1000
In [14]:
stock_data.mean()
Out[14]:
Open(NTD)              471.811568
High(NTD)              477.774449
Low(NTD)               465.228961
Close(NTD)             470.815529
Volume(1000S)           18.049971
Amount(NTD1000)          2.027967
AVG CLOSE              471.616269
AVG CLOSE 5D           470.598046
AVG CLOSE 10D          470.316699
AVG CLOSE 20D          469.696649
AVG Vol 5D              18.050959
AVG Vol 10D             18.031691
AVG Vol 20D             18.019274
ROI%                     0.046360
Shares(1000S)            8.897965
Market Cap.(NTD MN)    962.439744
P/E-TEJ                 15.044964
P/B-TEJ                  2.296988
Dividend_Yield%          4.227955
Cash_Dividend%           3.663234
Price_Change(NTD)        0.139675
High minus Low %         1.391597
Capital                115.530695
No.of Employee         119.783750
Reputation               5.500000
dtype: float64
In [15]:
stock_data.head()
Out[15]:
CO_ID Date Open(NTD) High(NTD) Low(NTD) Close(NTD) Volume(1000S) Amount(NTD1000) AVG CLOSE AVG CLOSE 5D ... P/E-TEJ P/B-TEJ Dividend_Yield% Cash_Dividend% Price_Change(NTD) High minus Low % Market Capital No.of Employee Reputation
0 0050 Yuanta Taiwan Top50 2015/1/5 66.40 66.75 66.00 66.55 6.295 0.417637 66.3379 66.81 ... 15.044964 2.296988 4.227955 2.33 -0.30 1.1219 TSE 115.530695 119.78375 2
1 0050 Yuanta Taiwan Top50 2015/1/6 65.75 65.75 64.75 64.90 19.501 1.272547 65.2527 66.47 ... 15.044964 2.296988 4.227955 2.39 -1.65 1.5026 TSE 115.530695 119.78375 2
2 0050 Yuanta Taiwan Top50 2015/1/7 64.70 65.25 64.70 65.00 6.991 0.454539 65.0127 66.03 ... 15.044964 2.296988 4.227955 2.38 0.10 0.8475 TSE 115.530695 119.78375 2
3 0050 Yuanta Taiwan Top50 2015/1/8 65.50 66.60 65.50 66.50 13.153 0.871151 66.2295 65.96 ... 15.044964 2.296988 4.227955 2.33 1.50 1.6923 TSE 115.530695 119.78375 2
4 0050 Yuanta Taiwan Top50 2015/1/9 66.90 66.95 66.05 66.15 5.891 0.391342 66.4195 65.82 ... 15.044964 2.296988 4.227955 2.34 -0.35 1.3534 TSE 115.530695 119.78375 2

5 rows × 28 columns

In [16]:
stock_data.tail()
Out[16]:
CO_ID Date Open(NTD) High(NTD) Low(NTD) Close(NTD) Volume(1000S) Amount(NTD1000) AVG CLOSE AVG CLOSE 5D ... P/E-TEJ P/B-TEJ Dividend_Yield% Cash_Dividend% Price_Change(NTD) High minus Low % Market Capital No.of Employee Reputation
14015 2885 Yuanta Group 2020/9/21 18.20 18.25 18.00 18.00 26.320 0.476759 18.1137 18.24 ... 11.4330 0.9541 5.83 3.4722 -0.30 1.3661 TSE 121.374359 14.223 4
14016 2885 Yuanta Group 2020/9/22 17.95 18.00 17.70 17.75 30.579 0.544932 17.8202 18.14 ... 11.2742 0.9408 5.92 3.5211 -0.25 1.6667 TSE 121.374359 14.223 4
14017 2885 Yuanta Group 2020/9/23 17.75 17.85 17.65 17.70 19.604 0.347381 17.7191 18.01 ... 11.2425 0.9382 5.93 3.5311 -0.05 1.1268 TSE 121.374359 14.223 4
14018 2885 Yuanta Group 2020/9/24 17.55 17.55 17.15 17.15 44.211 0.763162 17.2618 17.78 ... 10.8931 0.9090 6.12 3.6443 -0.55 2.2599 TSE 121.374359 14.223 4
14019 2885 Yuanta Group 2020/9/25 17.25 17.60 17.20 17.30 26.052 0.451970 17.3485 17.58 ... 10.9884 0.9170 6.07 3.6127 0.15 2.3324 TSE 121.374359 14.223 4

5 rows × 28 columns

Statistical Computing and Data Visualization

Statistical Computing

In [17]:
stock_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14020 entries, 0 to 14019
Data columns (total 28 columns):
CO_ID                  14020 non-null object
Date                   14020 non-null object
Open(NTD)              14020 non-null float64
High(NTD)              14020 non-null float64
Low(NTD)               14020 non-null float64
Close(NTD)             14020 non-null float64
Volume(1000S)          14020 non-null float64
Amount(NTD1000)        14020 non-null float64
AVG CLOSE              14020 non-null float64
AVG CLOSE 5D           14020 non-null float64
AVG CLOSE 10D          14020 non-null float64
AVG CLOSE 20D          14020 non-null float64
AVG Vol 5D             14020 non-null float64
AVG Vol 10D            14020 non-null float64
AVG Vol 20D            14020 non-null float64
ROI%                   14020 non-null float64
Shares(1000S)          14020 non-null float64
Market Cap.(NTD MN)    14020 non-null float64
P/E-TEJ                14020 non-null float64
P/B-TEJ                14020 non-null float64
Dividend_Yield%        14020 non-null float64
Cash_Dividend%         14020 non-null float64
Price_Change(NTD)      14020 non-null float64
High minus Low %       14020 non-null float64
Market                 14020 non-null object
Capital                14020 non-null float64
No.of Employee         14020 non-null float64
Reputation             14020 non-null int64
dtypes: float64(24), int64(1), object(3)
memory usage: 3.0+ MB
In [18]:
stock_data.describe()
Out[18]:
Open(NTD) High(NTD) Low(NTD) Close(NTD) Volume(1000S) Amount(NTD1000) AVG CLOSE AVG CLOSE 5D AVG CLOSE 10D AVG CLOSE 20D ... Market Cap.(NTD MN) P/E-TEJ P/B-TEJ Dividend_Yield% Cash_Dividend% Price_Change(NTD) High minus Low % Capital No.of Employee Reputation
count 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 ... 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000 14020.000000
mean 471.811568 477.774449 465.228961 470.815529 18.049971 2.027967 471.616269 470.598046 470.316699 469.696649 ... 962.439744 15.044964 2.296988 4.227955 3.663234 0.139675 1.391597 115.530695 119.783750 5.500000
std 1170.949182 1186.110606 1153.859675 1167.931604 18.597335 3.192999 1169.624776 1167.300873 1166.562404 1165.028368 ... 1725.599661 6.357659 1.845026 1.513258 1.391331 32.966458 1.234087 67.307252 216.184762 2.872384
min 10.000000 10.150000 9.970000 9.970000 0.000000 0.000000 10.019700 0.000000 10.130000 10.320000 ... 4.320000 5.530000 0.590000 0.890000 0.000000 -465.000000 0.000000 1.341402 7.719000 1.000000
25% 26.207500 26.300000 26.130000 26.220000 5.366000 0.352862 26.228375 26.217500 26.180000 26.140000 ... 154.262000 10.760000 1.110000 3.320000 2.654900 -0.350000 0.659900 115.002640 14.223000 3.000000
50% 60.800000 61.175000 60.250000 60.750000 13.570000 0.849133 60.710950 60.585000 60.730000 61.065000 ... 443.889000 14.820000 1.660000 4.227955 3.590000 0.000000 1.183400 118.452527 47.881000 5.500000
75% 213.500000 215.500000 212.000000 214.000000 24.859750 2.521701 214.133400 213.925000 214.062500 214.585000 ... 649.517000 16.240000 2.296988 5.020000 4.540000 0.400000 1.800150 138.629906 119.783750 8.000000
max 6000.000000 6075.000000 5955.000000 6000.000000 332.176000 77.804668 6006.630000 5918.000000 5879.500000 5835.000000 ... 11876.114000 47.550000 12.200000 11.000000 11.000000 390.000000 12.797600 259.303805 757.404000 10.000000

8 rows × 25 columns

In [19]:
print(stock_data.nunique())
CO_ID                     10
Date                    1402
Open(NTD)               3224
High(NTD)               3246
Low(NTD)                3280
Close(NTD)              3289
Volume(1000S)          11507
Amount(NTD1000)        13970
AVG CLOSE              13928
AVG CLOSE 5D            7686
AVG CLOSE 10D           9088
AVG CLOSE 20D           9956
AVG Vol 5D             13999
AVG Vol 10D            13997
AVG Vol 20D            14002
ROI%                    8062
Shares(1000S)           1198
Market Cap.(NTD MN)     6911
P/E-TEJ                 3977
P/B-TEJ                 2794
Dividend_Yield%          787
Cash_Dividend%          3323
Price_Change(NTD)        476
High minus Low %        6099
Market                     1
Capital                    9
No.of Employee             9
Reputation                10
dtype: int64
In [20]:
stock_data_cols = stock_data.columns
stock_data_cols_set = set(stock_data_cols)
stock_data_cols_list = list(stock_data_cols_set)
stock_data_continuous_variables_cols = stock_data._get_numeric_data().columns
stock_data_continuous_variables_cols_set = set(stock_data_continuous_variables_cols)
stock_data_continuous_variables_cols_list = list(stock_data_continuous_variables_cols_set)
print("stock_data_continuous_variables: ", stock_data_continuous_variables_cols_list)
stock_data_categorical_variables_cols_set = stock_data_cols_set - stock_data_continuous_variables_cols_set
stock_data_categorical_variables_cols_list = list(stock_data_categorical_variables_cols_set)
print("stock_data_categorical_variables:", stock_data_categorical_variables_cols_list)
stock_data_continuous_variables:  ['P/B-TEJ', 'Amount(NTD1000)', 'AVG CLOSE 5D', 'AVG CLOSE 10D', 'AVG Vol 10D', 'P/E-TEJ', 'Close(NTD)', 'AVG CLOSE 20D', 'ROI%', 'Dividend_Yield%', 'No.of Employee', 'Market Cap.(NTD MN)', 'AVG CLOSE', 'AVG Vol 20D', 'High minus Low %', 'AVG Vol 5D', 'Reputation', 'Capital', 'Open(NTD)', 'Low(NTD)', 'Volume(1000S)', 'Shares(1000S)', 'Cash_Dividend%', 'High(NTD)', 'Price_Change(NTD)']
stock_data_categorical_variables: ['CO_ID', 'Market', 'Date']

frequency tables

In [21]:
[print(f'categorical variable: {i}\n{stock_data[i].value_counts()}\n') for i in stock_data_categorical_variables_cols_list]
categorical variable: CO_ID
2884 E.S.F.H                1402
0050 Yuanta Taiwan Top50    1402
2330 TSMC                   1402
2881 Fubon FHC              1402
2882 Cathay Holdings        1402
2317 Hon Hai Precision      1402
2454 MediaTek               1402
2885 Yuanta Group           1402
0056 PTD                    1402
3008 Largan                 1402
Name: CO_ID, dtype: int64

categorical variable: Market
TSE    14020
Name: Market, dtype: int64

categorical variable: Date
2020/3/2      10
2020/5/14     10
2019/12/2     10
2020/7/2      10
2018/10/12    10
              ..
2018/2/8      10
2017/8/3      10
2018/12/26    10
2017/1/6      10
2018/6/27     10
Name: Date, Length: 1402, dtype: int64

Out[21]:
[None, None, None]

relative frequency tables

In [22]:
[print(f'categorical variable: {i}\n{stock_data[i].value_counts(normalize=True, sort=True)}\n') for i in stock_data_categorical_variables_cols_list]
categorical variable: CO_ID
2884 E.S.F.H                0.1
0050 Yuanta Taiwan Top50    0.1
2330 TSMC                   0.1
2881 Fubon FHC              0.1
2882 Cathay Holdings        0.1
2317 Hon Hai Precision      0.1
2454 MediaTek               0.1
2885 Yuanta Group           0.1
0056 PTD                    0.1
3008 Largan                 0.1
Name: CO_ID, dtype: float64

categorical variable: Market
TSE    1.0
Name: Market, dtype: float64

categorical variable: Date
2020/3/2      0.000713
2020/5/14     0.000713
2019/12/2     0.000713
2020/7/2      0.000713
2018/10/12    0.000713
                ...   
2018/2/8      0.000713
2017/8/3      0.000713
2018/12/26    0.000713
2017/1/6      0.000713
2018/6/27     0.000713
Name: Date, Length: 1402, dtype: float64

Out[22]:
[None, None, None]

Data Visualization

bar plots

In [23]:
# [stock_data[i].value_counts().plot.bar(title=f'Freq dist of {i}') for i in stock_data_categorical_variables_cols_list]
for stock_data_categorical_variables_col in stock_data_categorical_variables_cols_list:
    stock_data[stock_data_categorical_variables_col].value_counts().plot.bar()
    plt.title(f'Freq dist of {stock_data_categorical_variables_col}')
    plt.show()

histograms

In [24]:
for stock_data_continuous_variables_col in stock_data_continuous_variables_cols_list: 
    stock_data[stock_data_continuous_variables_col].hist(figsize= (8, 8), bins= 100)
    plt.title(stock_data_continuous_variables_col)
    plt.show()

distribution plots

In [25]:
stock_data_continuous_variables_cols_list_without_nan = []
for i in stock_data_continuous_variables_cols_list:
    if stock_data[i].isnull().values.any() == False:
        stock_data_continuous_variables_cols_list_without_nan.append(i)

fig, ax = plt.subplots(len(stock_data_continuous_variables_cols_list_without_nan), figsize=(16, 30))
for stock_data_idx, stock_data_continuous_variables_col in enumerate(stock_data_continuous_variables_cols_list_without_nan):
    sns.distplot(stock_data[stock_data_continuous_variables_col], hist=True, ax=ax[stock_data_idx])
    ax[stock_data_idx].set_title('Freq dist '+ stock_data_continuous_variables_col, fontsize=20)
    ax[stock_data_idx].set_xlabel(stock_data_continuous_variables_col, fontsize=10)
    ax[stock_data_idx].set_ylabel('Count', fontsize=10)
plt.show()

pairwise scatterplots

In [26]:
sns.pairplot(stock_data[stock_data_continuous_variables_cols_list])
plt.show()

heatmap

In [27]:
plt.figure(figsize=(10, 10))
sns.heatmap(stock_data.corr(), annot=False, center=0.0, cmap='coolwarm'); # cmap="YlGnBu",
plt.show()

Data Processing - Part 2

Transform the data's type, format, and shape so the models can process it.

create rolling window

observe one month (30 trading days) of history to decide whether to buy or sell

In [28]:
stock_type = set(stock_data[stock_data.columns[0]])  # the ten CO_ID values
observe_date_long = 30  # one-month observation window
In [29]:
col_name = []
stock_data_list = []
Y = []
for s in stock_type:
    col_name.append(s)
    # keep this stock's rows only, and drop the non-feature columns
    df = stock_data[stock_data[stock_data.columns[0]] == s]
    df = df.dropna(axis='columns')
    df = df.drop(columns=stock_data.columns[0])  # CO_ID
    df = df.drop(columns=stock_data.columns[1])  # Date
    df = df.drop(columns='Market')
    rep = np.unique(df['Reputation'])[0]
    df = df.drop(columns='Reputation')
    df = df.reset_index(drop=True)
    # target: ROI% beyond the observation window
    Y.append(df['ROI%'].iloc[observe_date_long:])
    df = df.drop(columns='ROI%')
    df = df.reset_index(drop=True)
    # build the window by concatenating 30 successively shifted copies side by side
    df_1 = df.copy()
    for i in range(observe_date_long - 1):
        df_new = df_1.drop(df_1.index[0], inplace=False)  # shift up by one day
        df_new = df_new.append(df_1.iloc[-1])             # pad by repeating the last row
        df_1 = df_1.reset_index(drop=True)
        df_new = df_new.reset_index(drop=True)
        df_1 = df_new.copy()
        df = pd.concat([df, df_1], axis=1)
    df['Reputation'] = rep
    # drop the first observe_date_long rows so X lines up with Y
    df = df.drop(df.index[:observe_date_long], inplace=False)
    stock_data_list.append(df)
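
The shift-and-append loop above can be expressed more idiomatically with pandas.DataFrame.shift; a minimal sketch under our own naming (note this variant looks backward, pairing each day's row with the previous window days of features, and simply drops rows without a full history):

import pandas as pd

def make_window(df, window=30):
    # concatenate `window` lagged copies so each row holds the prior month of features
    lagged = [df.shift(k).add_suffix(f'_lag{k}') for k in range(1, window + 1)]
    return pd.concat(lagged, axis=1).dropna()
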
In [30]:
stock_data_roll_window = stock_data_list[0]
for i in range(len(stock_data_list) - 1):
    stock_data_roll_window = pd.concat([stock_data_roll_window, stock_data_list[i + 1]], axis=0)
In [31]:
X = stock_data_list.copy()

generate train data and test data

reserve the last month (30 trading days) for testing

In [32]:
train_x = []
test_x = []
train_y = []
test_y = []
n_test = 30  # hold out the last 30 trading days of each stock
for i in range(len(stock_type)):
    train_x.append(X[i].iloc[:-n_test])
    test_x.append(X[i].iloc[-n_test:])
    train_y.append(Y[i][:-n_test])
    test_y.append(Y[i][-n_test:])
In [33]:
train_x_roll_window = train_x[0]
test_x_roll_window = test_x[0]
train_y_roll_window = train_y[0]
test_y_roll_window = test_y[0]
for i in range(len(train_x) - 1):
    train_x_roll_window = pd.concat([train_x_roll_window, train_x[i + 1]], axis=0)
    test_x_roll_window = pd.concat([test_x_roll_window, test_x[i + 1]], axis=0)
    train_y_roll_window = pd.concat([train_y_roll_window, train_y[i + 1]], axis=0)
    test_y_roll_window = pd.concat([test_y_roll_window, test_y[i + 1]], axis=0)
train_x = train_x_roll_window
test_x = test_x_roll_window
train_y = np.array(train_y_roll_window)
test_y = np.array(test_y_roll_window)

Data Processing - Part 3

transform the data format and shape so the models can process them

Data analysis - PCA for SVM

import module

In [34]:
from sklearn.decomposition import PCA
In [36]:
n_com = 10
explained_ratio = 0
explained_ratio_threshold = 90  # percent of variance that must be explained
for i in range(len(stock_type)):
    covar_matrix = PCA(n_components = n_com)
    # calculate cumulative variance ratios (in percent)
    covar_matrix.fit(X[i])
    cumulative_sum_of_variance_explained = np.cumsum(np.round(covar_matrix.explained_variance_ratio_, decimals=4) * 100)
    explained_ratio = cumulative_sum_of_variance_explained[-1]
    # grow n_com until this stock clears the threshold; n_com carries over
    # between stocks, so the final value is large enough for every stock
    while explained_ratio < explained_ratio_threshold:
        n_com += 1
        covar_matrix = PCA(n_components = n_com)
        covar_matrix.fit(X[i])
        cumulative_sum_of_variance_explained = np.cumsum(np.round(covar_matrix.explained_variance_ratio_, decimals=4) * 100)
        explained_ratio = cumulative_sum_of_variance_explained[-1]
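
As an aside, sklearn can perform this search directly: passing a float between 0 and 1 as n_components makes PCA pick the smallest number of components whose cumulative explained variance reaches that fraction. A minimal sketch with the same 90% target:

pca_90 = PCA(n_components=0.90, svd_solver='full')  # keep enough components for >= 90% variance
pca_90.fit(X[0])
print(pca_90.n_components_)
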
In [37]:
X_pca = []
for i in range(len(stock_type)):
    pca = PCA(n_components = n_com)
    principalComponents = pca.fit_transform(X[i])
    print(col_name[i], 'explained variance ratio:', np.sum(pca.explained_variance_ratio_))
    X_pca.append(pd.DataFrame(data = principalComponents))
0056 PTD explained variance ratio: 0.9501674397417276
0050 Yuanta Taiwan Top50 explained variance ratio: 0.9718934872626461
3008 Largan explained variance ratio: 0.9958298600286208
2454 MediaTek explained variance ratio: 0.9969411144075198
2881 Fubon FHC explained variance ratio: 0.978613483656168
2317 Hon Hai Precision explained variance ratio: 0.9826861176075244
2884 E.S.F.H explained variance ratio: 0.9766266710817479
2882 Cathay Holdings explained variance ratio: 0.9762145734322231
2330 TSMC explained variance ratio: 0.9989762230571851
2885 Yuanta Group explained variance ratio: 0.9097365021256499
In [38]:
train_x_pca = []
test_x_pca = []
n_test = 30
for i in range(len(stock_type)):
    train_x_pca.append(X_pca[i].iloc[:-n_test])
    test_x_pca.append(X_pca[i].iloc[-n_test:])
In [39]:
train_x_pca_roll_window = train_x_pca[0]
test_x_pca_roll_window = test_x_pca[0]
for i in range(len(train_x_pca) - 1):
    train_x_pca_roll_window = pd.concat([train_x_pca_roll_window, train_x_pca[i + 1]], axis=0)
    test_x_pca_roll_window = pd.concat([test_x_pca_roll_window, test_x_pca[i + 1]], axis=0)
train_x_pca = train_x_pca_roll_window
test_x_pca = test_x_pca_roll_window

Shuffle the data.

import module

In [40]:
from sklearn.utils import shuffle
In [41]:
train_x, train_x_pca, train_y = shuffle(train_x, train_x_pca, train_y, random_state=0)

Building Models

model - linear regression

import module

In [42]:
from sklearn import linear_model
import statsmodels.api as sm
In [43]:
#Fit the linear regression.
regr = linear_model.LinearRegression()
regr = regr.fit(train_x, train_y)
#Print the coefficient.
print('intercept:', regr.intercept_)
print('coef:', regr.coef_)
intercept: 0.3015263331562417
coef: [ 1.05877615e-05  2.37582226e-04  6.01353078e-04 -1.23840980e-01
  6.64836884e-02  1.31909188e-03  1.99021852e-04  7.01158294e-01
 -6.67184423e-01  1.32202845e-04 -5.64251639e-01  3.43516174e-02
 -7.00612358e-02  3.40348617e-01  4.23533113e-04  3.94873587e-02
 -3.53634461e-02 -1.94933971e-01  1.26783348e-01  3.11265038e-02
  9.03702188e-02 -1.70259315e-05 -2.51085049e-06  2.94890526e-05
  1.04502194e-03 -3.24258678e-03  9.19213225e-03  3.95934977e-01
  1.60577814e-02  2.35186535e-03 -5.23047640e-01 -7.38501663e-01
  6.01964210e-04  4.20434504e-01 -1.19936999e+00  9.44767872e-02
  6.74221575e-01 -3.69582760e-04 -7.22841628e-02 -7.21724270e-03
  1.47067445e-01 -1.21221153e-01  1.21201949e-03 -1.00466034e-01
 -1.70259309e-05 -2.51090828e-06  1.12096404e-03 -2.72565694e-03
 -3.83556812e-03  6.66624629e-01  3.99257155e-01 -8.31597157e-03
  6.68013251e-03  5.74970811e-01  1.30482436e+00  4.30008603e-03
 -7.47520975e-01  2.99506634e+00 -9.08799910e-04 -1.40177874e+00
 -2.93536164e-04  3.42770429e-02  1.42266671e-01  7.44726877e-02
 -5.78900523e-02  1.81429731e-03  5.58260063e-03 -1.70259299e-05
 -2.51090960e-06 -1.07325377e-03  1.08287506e-03 -1.79177705e-03
  4.93966159e-01  2.36785221e-01  1.36141499e-02  2.76883244e-03
 -3.12260256e-01 -1.94449519e+00 -1.78380509e-02  4.95238462e-01
 -1.68765285e+00  4.14377336e-02 -1.05652519e-01  2.61676480e-04
  1.27868652e-02  1.82789513e-01  3.09067642e-02 -3.16325719e-01
  3.59214823e-03 -7.59816593e-02 -1.70259301e-05 -2.51090394e-06
  1.00531585e-03 -1.97611057e-03 -2.28453882e-03  8.98199265e-01
  4.09473677e-01 -2.07345538e-02  2.93856469e-03  4.66188050e-01
 -3.18741899e-01  1.45566064e-02 -4.36451872e-01  2.80515317e+00
 -5.62462884e-02  6.23638043e-01  4.28501046e-04  4.48399132e-02
 -3.48744600e-01  3.72447343e-01  7.13292099e-02 -6.42086227e-03
 -5.86523446e-03 -1.70259401e-05 -2.51092650e-06 -5.92506652e-04
 -9.63414512e-04 -2.26382636e-05  1.72915873e+00  3.44608307e-01
  4.06758270e-02  3.30318038e-03 -1.07121578e+00  1.23333823e+00
  5.60817146e-03  1.75313159e+00 -9.38272774e-01 -4.02237107e-02
 -1.22044939e-01 -8.25873084e-04 -4.19294991e-02  1.45713636e-01
 -3.78431039e-01 -6.57013277e-02 -2.64362092e-04  1.43790399e-03
 -1.70259272e-05 -2.51090545e-06  1.68259631e-04  3.88963198e-04
 -3.49738468e-03  1.07579863e+00  2.44079401e-01 -2.50298519e-02
  3.48340356e-03 -6.02199828e-01  8.19977077e-01 -1.75298686e-02
  9.39115694e-01 -3.82210046e+00  1.07492643e-01 -4.86078338e-01
 -1.31702970e-04 -4.96869070e-03  2.69269835e-02  3.45537702e-01
  2.36417365e-01 -8.86091333e-04 -3.70987665e-02 -1.70259294e-05
 -2.51090428e-06  2.03674645e-04  2.11891703e-03  1.39446289e-03
  1.95209396e+00  1.90932227e-01  2.55634755e-03 -4.08774626e-03
  1.03374598e+00 -1.46923392e+00  2.16731296e-02 -1.05882149e-01
  3.35186171e+00 -9.98992035e-02  6.19557987e-01  4.91487125e-04
 -6.04201555e-02 -9.72680499e-03 -1.70706365e-01  1.08789949e-01
  2.16604767e-03 -2.91909471e-02 -1.70259286e-05 -2.51090972e-06
  7.94807671e-04 -4.25597962e-04 -6.05238021e-04  2.63616036e+00
  4.35535302e-01 -9.74378400e-03  4.43801235e-04 -2.72125681e+00
 -8.69868398e-01 -2.30811381e-02  2.12241446e+00  6.69848509e-01
  4.80347982e-02  1.19868787e-01  1.87722706e-04  9.26511425e-02
 -1.69281381e-01 -1.77999407e-01 -1.46127836e-02  1.36281858e-03
 -1.07213425e-03 -1.70259273e-05 -2.51090440e-06 -4.82803892e-04
  1.41872500e-03  1.40314033e-03  1.34255323e+00  1.86371682e-01
  1.00128919e-03 -2.54748898e-03 -1.56577766e+00 -2.91488327e-01
  1.87799357e-02  7.04679777e-01  4.18438768e+00 -6.69169315e-02
 -3.11471132e-01 -5.93642246e-04 -2.19757411e-02 -9.56690511e-02
 -2.29399659e-02 -4.42291031e-02 -3.50901907e-03  3.39017617e-03
 -1.70259303e-05 -2.51090839e-06 -5.17684522e-05 -4.64479933e-05
 -1.72694592e-03  9.28324301e-01  2.81838594e-01  1.80838947e-03
  1.21163228e-03  1.42783104e+00  2.81084448e-01  1.33282356e+00
 -1.05990525e+00  7.25757326e-02 -2.97485216e-03 -3.61795281e-01
  4.49647832e-05 -6.59999585e-02  3.24055323e-01  7.91079706e-02
  1.05708791e-01  3.60079615e-03 -3.67191208e-02 -1.70259278e-05
 -2.51090582e-06 -4.78536248e-04 -4.00518791e-03 -7.27536626e-04
  6.09677399e-01  4.21696655e-02 -6.81604359e-03  5.68698305e-03
 -2.26236991e+00 -7.14560472e+00  1.46822620e+00  1.96314390e+00
 -6.82534552e+00  2.41184272e+00  6.67400597e-01  6.48358458e-04
  8.46575446e-02 -4.77051055e-01 -3.14664257e-01 -1.41596063e-01
 -2.86195037e-03  8.75393495e-02 -1.70259280e-05 -2.51090431e-06
 -5.93931035e-04  3.05194080e-03  2.35662774e-03 -5.86668872e-01
 -1.42335098e-01  3.59919021e-03 -4.40570853e-03 -3.43314982e+00
 -1.39187273e-01 -2.61192561e+00  3.38250409e+00 -3.37829361e+00
 -6.06165052e+00 -5.42517252e-01 -3.27242817e-04 -9.98686896e-02
  8.64009084e-01  4.02043820e-01 -5.63443985e-02  4.19998796e-03
 -1.10680286e-02 -1.70259288e-05 -2.51090513e-06  7.97924529e-04
 -3.08293556e-03 -1.03648885e-03  1.76255478e-02  5.07763696e-01
 -8.74804233e-03  3.59027924e-03  2.27301393e+00 -1.82844593e+00
  3.90416621e+00 -1.09838136e+00 -2.24249643e+00  3.40907818e+00
  1.17783072e-01  3.91833743e-04  4.39216907e-02 -1.03039736e+00
 -2.95059205e-01 -6.37891118e-03  7.22586083e-03  4.08981889e-02
 -1.70259284e-05 -2.51090598e-06 -1.47476597e-05  3.59763035e-04
 -1.41459573e-03  1.00168496e+00  2.34752601e-01  1.18633637e-02
  1.04085071e-03 -1.40292222e+00 -2.92386803e-01  6.23630280e-01
  1.18571617e+00  9.09002669e-01 -5.51444067e+00  1.13906467e+00
 -1.52286672e-03  5.01909749e-02  1.65646389e-01  2.63472440e-01
  1.45061206e-01  3.26648627e-03 -2.24885680e-02 -1.70259284e-05
 -2.51090515e-06  5.61510950e-04 -1.33979902e-03 -2.06914768e-03
  4.93311114e-02  1.71785168e-01 -1.49405759e-02  2.93124577e-03
  2.71513082e+00  8.88784895e-01 -5.27688991e+00 -1.31153637e+00
  6.14133369e+00  3.98697094e+00 -4.91472973e-01  1.16097705e-03
 -5.16395693e-02  3.06144290e-01 -1.42096674e-01 -1.46633135e-01
 -3.50873367e-03  1.68707886e-02 -1.70259288e-05 -2.51090588e-06
  8.90921905e-04 -1.17130429e-03 -3.44022157e-04  1.59491037e+00
  3.68942166e-02  3.32602886e-02  1.38855180e-03 -2.39690028e+00
 -5.08836024e+00  4.67520791e-01  3.23080098e+00 -4.29234942e+00
  6.03416022e+00 -4.21104437e-01  1.19497609e-03  1.37986821e-03
 -5.93084253e-02  5.11716404e-02 -8.90336949e-02  2.40634603e-03
 -2.83390870e-02 -1.70259285e-05 -2.51090554e-06 -1.26876678e-03
  1.14999346e-03  3.10138685e-03  1.00950696e+00  2.95525442e-01
 -3.16237048e-03 -1.82417840e-03 -2.78442584e+00  6.47758919e-01
  6.30140550e-01  2.04704007e+00 -1.69645789e+00 -3.87555150e+00
  2.12952264e-01 -1.73437093e-03  1.04927469e-02  2.41964615e-01
  3.74494323e-01  4.98320418e-02 -6.73325526e-03 -8.77094068e-03
 -1.70259289e-05 -2.51090562e-06  2.85554762e-04 -9.29987806e-04
 -1.47990501e-04  6.23525534e-01  1.65534477e-01 -9.53945638e-03
  1.56579215e-03  8.78990571e-01  2.19474900e+00  3.03311981e+00
 -7.73079974e-01  4.84375626e+00 -3.08397998e+00 -8.89961756e-01
 -4.44061594e-04  7.38572540e-02 -1.60667636e-01 -3.36474410e-01
 -3.83471853e-02 -1.90001061e-03  1.53751860e-02 -1.70259285e-05
 -2.51090586e-06  2.05891546e-04 -4.39661090e-04  3.63181246e-05
  7.49825625e-01  4.23857445e-01  4.61896003e-03  1.40299697e-03
  2.92544103e-01  1.82468203e+00  1.11480836e+00 -1.47703178e-01
 -2.09482382e+00 -8.01169628e+00  2.40130645e-01  1.19452102e-03
 -8.88837075e-02  1.53300797e-01  2.26595867e-01  1.58604706e-01
 -1.59085840e-03  3.74863975e-02 -1.70259287e-05 -2.51090559e-06
 -8.25176026e-04  7.15380289e-04 -1.91573945e-03  1.18642119e+00
  1.46491333e-01  4.94257643e-03  2.34685066e-03 -4.12438100e+00
  1.66682626e+00  2.58135236e+00  3.19451997e+00  2.96737645e+00
 -1.60894484e+01  4.93223067e-01  2.79436693e-04  1.23649625e-01
 -7.73634073e-01  4.90365907e-02  1.42002304e-01 -3.29148240e-03
 -4.70069331e-02 -1.70259282e-05 -2.51090540e-06 -3.82970853e-04
 -1.40366143e-03 -1.62737620e-03  6.02128164e-01  2.53366565e-01
 -1.56799610e-02  4.22142754e-03 -6.74348172e-01  5.79053691e+00
 -1.87663816e-02  1.54816027e+00 -6.28754958e+00  9.34046124e+00
  3.74775501e-01 -8.78634407e-05 -4.80649536e-02  3.71904540e-01
 -4.90702639e-01 -4.53268617e-02  6.88986188e-04 -1.77213280e-02
 -1.70259286e-05 -2.51090572e-06  1.31936520e-03 -1.32372564e-03
 -1.14093002e-03  1.21806726e+00  3.26424467e-01  7.63122132e-03
 -2.70095477e-04  1.05699820e+00 -8.84298923e+00  4.27148239e+00
 -2.58651385e-02 -1.02562046e+01  1.33207352e+01 -5.15688787e-01
 -4.01656468e-04 -1.74056591e-02 -9.33302637e-02  2.51722294e-01
  1.05925323e-01  1.00676081e-02 -1.11163068e-02 -1.70259283e-05
 -2.51090555e-06  8.37548430e-04 -4.76914938e-04  9.20592534e-04
  1.00314981e+00 -1.61510427e-01 -6.52510202e-03 -1.23816544e-03
 -2.39392688e+00 -6.96426882e+00  1.67531947e+00  2.86370773e+00
  1.51876988e+00 -8.76305181e+00 -3.29447936e-01  6.27913985e-04
 -7.51670239e-02  3.44541772e-01 -2.60570951e-01 -2.12398116e-01
 -4.38714937e-03  9.25527385e-03 -1.70259284e-05 -2.51090568e-06
  1.37602732e-03  6.07897932e-04  1.15007657e-03  4.01291886e-02
 -6.46022740e-02  4.53948651e-03 -4.49574594e-03 -7.05831058e-01
  6.27029570e+00 -8.73042938e+00  1.67788085e+00  1.49559278e+00
  3.29359106e-01  1.10069617e+00 -4.21313731e-04  9.93756053e-02
 -1.34602788e-01 -1.19079432e-01 -5.74729153e-02  1.08713174e-02
  1.70463926e-02 -1.70259284e-05 -2.51090586e-06  6.99041941e-04
 -9.73363864e-05  4.04061712e-04  5.55471003e-01  1.66363576e-01
  1.55899211e-03 -3.22912734e-04  2.99008047e-01 -3.67015025e+00
  3.87793260e+00  1.11851410e-01  4.39586580e-01 -7.76449589e-01
 -8.25413969e-01  8.64282245e-05  1.95589720e-04  2.81462577e-01
  3.52646619e-01 -1.40282494e-01 -1.12031769e-03  1.26344618e-02
 -1.70259284e-05 -2.51090582e-06  9.50156387e-04 -4.00401805e-04
  7.06956468e-05  4.47010587e-01  1.89271794e-01  2.10712435e-02
 -9.24851698e-04 -9.69575081e-01 -2.31904102e+00  8.61471234e-01
  3.67305733e-01 -4.85035149e-01 -2.00309224e+00 -4.82928523e-01
 -7.21838586e-04 -1.11643050e-01 -1.54239788e-01  3.15967875e-02
  1.22632530e-01 -4.60913277e-03  8.33381715e-03 -1.70259283e-05
 -2.51090569e-06  9.59737583e-04 -2.38436623e-03 -3.91853622e-04
  6.11048190e-02  1.15730959e-01 -1.72728809e-02  1.82388161e-03
 -1.08276262e+00 -2.90266626e+00 -1.43912447e-01  6.31409784e-01
  1.85416731e+00 -8.88480829e+00  4.36524583e-01  3.97074996e-04
  6.56262308e-02  1.59337146e-01 -2.80390119e-01  7.51188254e-02
 -5.16458964e-03  3.56441768e-02 -1.70259285e-05 -2.51090585e-06
  1.02826625e-03 -8.65235573e-04  1.12111591e-03 -4.49331888e-01
 -1.31193138e-02 -1.39380506e-02 -1.30871635e-03 -7.73345920e-01
  8.98434496e+00  1.71126392e-02  9.36918666e-01 -6.65032455e+00
  9.58295449e+00 -1.97886344e-01  4.70361112e-04 -5.50952248e-02
 -3.63170677e-01 -2.13193666e-02  1.48623443e-01 -6.52825260e-03
  3.50731733e-02 -1.70259285e-05 -2.51090585e-06 -7.30225430e-05
  5.57783951e-04 -1.22303969e-03  3.00375736e-01 -1.45648094e-02
  2.13743788e-02  3.40072038e-04 -2.29333729e+00  6.84470043e-02
  3.15580377e+00  1.36124726e+00 -6.90764037e+00  8.63816328e+00
  3.30061308e-01 -4.66845360e-04  4.02102191e-02  2.28596397e-01
  1.30865180e-01 -7.55549793e-02 -6.88063148e-03 -3.63495999e-02
 -1.70259287e-05 -2.51091975e-06 -2.61902708e-03]
In [44]:
X_OLS = sm.add_constant(train_x)
train_y = list(train_y)
model = (sm.OLS(train_y, X_OLS).fit())
D:\programming\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py:2580: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
  return ptp(axis=axis, out=out, **kwargs)
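
The fitted OLS results are not inspected further here; if desired, statsmodels can print the full coefficient table for the model fitted above:

print(model.summary())  # coefficients, t-statistics, R-squared, etc.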

model - SVM

import module

In [45]:
from sklearn.svm import LinearSVR
In [46]:
svm = LinearSVR()
svm = svm.fit(train_x_pca, train_y)
D:\programming\Anaconda3\lib\site-packages\sklearn\svm\base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

model - DNN

import module

In [47]:
import keras
In [48]:
# two small hidden layers and a linear output, regressing ROI%
model = keras.models.Sequential()
model.add(keras.layers.Dense(10, activation="selu"))
model.add(keras.layers.Dense(10, activation='relu'))
model.add(keras.layers.Dense(1))
model.compile(loss="mse", optimizer="adam", metrics=['mae'])
model.fit(np.array(train_x_pca), np.array(train_y), batch_size=256, epochs=20)
Epoch 1/20
53/53 [==============================] - 0s 3ms/step - loss: 239059.6406 - mae: 172.1244
Epoch 2/20
53/53 [==============================] - 0s 3ms/step - loss: 87664.9609 - mae: 96.8853
Epoch 3/20
53/53 [==============================] - 0s 2ms/step - loss: 44117.7266 - mae: 68.4562
Epoch 4/20
53/53 [==============================] - 0s 2ms/step - loss: 22283.3008 - mae: 47.6736
Epoch 5/20
53/53 [==============================] - 0s 2ms/step - loss: 8308.7275 - mae: 30.0375
Epoch 6/20
53/53 [==============================] - 0s 2ms/step - loss: 4881.0269 - mae: 23.1260
Epoch 7/20
53/53 [==============================] - 0s 2ms/step - loss: 3071.0894 - mae: 18.3771
Epoch 8/20
53/53 [==============================] - 0s 3ms/step - loss: 1952.2456 - mae: 14.7146
Epoch 9/20
53/53 [==============================] - 0s 2ms/step - loss: 1268.8518 - mae: 12.0557
Epoch 10/20
53/53 [==============================] - 0s 2ms/step - loss: 861.1654 - mae: 10.3314
Epoch 11/20
53/53 [==============================] - 0s 2ms/step - loss: 625.9505 - mae: 9.2015
Epoch 12/20
53/53 [==============================] - 0s 2ms/step - loss: 473.8837 - mae: 8.3779
Epoch 13/20
53/53 [==============================] - 0s 2ms/step - loss: 387.2669 - mae: 7.8045
Epoch 14/20
53/53 [==============================] - 0s 2ms/step - loss: 337.1528 - mae: 7.4030
Epoch 15/20
53/53 [==============================] - 0s 2ms/step - loss: 300.9296 - mae: 7.0566
Epoch 16/20
53/53 [==============================] - 0s 2ms/step - loss: 270.5199 - mae: 6.7712
Epoch 17/20
53/53 [==============================] - 0s 2ms/step - loss: 245.1004 - mae: 6.4837
Epoch 18/20
53/53 [==============================] - 0s 3ms/step - loss: 225.2959 - mae: 6.2643
Epoch 19/20
53/53 [==============================] - 0s 2ms/step - loss: 209.5723 - mae: 6.0763
Epoch 20/20
53/53 [==============================] - 0s 2ms/step - loss: 195.7066 - mae: 5.8994
Out[48]:
<tensorflow.python.keras.callbacks.History at 0x22311edb6c8>
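
The run above trains for a fixed 20 epochs with no held-out validation data. A variant with a validation split and early stopping (our addition, using standard Keras callbacks) would look like:

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                           restore_best_weights=True)
model.fit(np.array(train_x_pca), np.array(train_y),
          batch_size=256, epochs=100,
          validation_split=0.1, callbacks=[early_stop])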

Test

long-buy and short-sell

In [49]:
def long(test_day, n_test, stock_type1, stock_type2, stock_type3, money):
    # go long: stock_type1/2/3 are the stocks ranked 1st/2nd/3rd by predicted rise
    buy = 0.5 * money
    transaction_fee = 0.1 / 100
    ROI_1 = test_y[test_day + stock_type1 * n_test] / 100
    ROI_2 = test_y[test_day + stock_type2 * n_test] / 100
    ROI_3 = test_y[test_day + stock_type3 * n_test] / 100
    if ROI_3 > transaction_fee:
        # all three picks beat the fee: split the budget 50% / 25% / 25%
        money += 0.5 * buy * ROI_1
        money += 0.5 * 0.5 * buy * ROI_2
        money += 0.5 * 0.5 * buy * ROI_3
        money -= transaction_fee * buy
        # print('ROI:', ROI_1, ROI_2, ROI_3)
    elif ROI_2 > transaction_fee:
        # top two picks beat the fee: split the budget 50% / 50%
        money += 0.5 * buy * ROI_1
        money += 0.5 * buy * ROI_2
        money -= transaction_fee * buy
        # print('ROI:', ROI_1, ROI_2)
    elif ROI_1 > transaction_fee:
        # only the top pick beats the fee: put the whole budget on it
        money += buy * ROI_1
        money -= transaction_fee * buy
        # print('ROI:', ROI_1)
    print('money since long:', money)
    return money

def short(test_day, n_test, stock_type1, stock_type2, stock_type3, money):
    # sell short: stock_type1/2/3 are the stocks ranked 1st/2nd/3rd by predicted fall
    sell = 0.5 * money
    transaction_fee = 0.1 / 100
    ROI_1 = test_y[test_day + stock_type1 * n_test] / 100
    ROI_2 = test_y[test_day + stock_type2 * n_test] / 100
    ROI_3 = test_y[test_day + stock_type3 * n_test] / 100
    if ROI_3 < -transaction_fee:
        money -= 0.5 * sell * ROI_1
        money -= 0.5 * 0.5 * sell * ROI_2
        money -= 0.5 * 0.5 * sell * ROI_3
        money -= transaction_fee * sell
        # print('ROI:', ROI_1, ROI_2, ROI_3)
    elif ROI_2 < -transaction_fee:
        money -= 0.5 * sell * ROI_1
        money -= 0.5 * sell * ROI_2
        money -= transaction_fee * sell
        # print('ROI:', ROI_1, ROI_2)
    elif ROI_1 < -transaction_fee:
        money -= sell * ROI_1
        money -= transaction_fee * sell
        # print('ROI:', ROI_1)
    print('money since short:', money)
    return money
In [50]:
money_std = 10000
In [51]:
def money_gain(n_test, pred_y, money_begin):
    money = money_begin
    print('day 0')
    print('money initial:', money)
    for test_day in range(n_test):
        # this day's predictions across the ten stocks (pred_y is laid out stock-major)
        pred_i = pred_y[test_day : : n_test]
        sort_idx = np.argsort(pred_i)
        print('day', test_day + 1)
        # long the three highest predictions, short the three lowest
        money = long(test_day, n_test, sort_idx[-1], sort_idx[-2], sort_idx[-3], money)
        money = short(test_day, n_test, sort_idx[0], sort_idx[1], sort_idx[2], money)
    return money

linear regression

In [52]:
pred_y_regr = regr.predict(test_x)
print('predict rate of return:')
print(pred_y_regr)
predict rate of return:
[-2.53570846e-01 -4.09975935e+00 -7.59036957e+00 -5.70604480e+00
  1.19007336e+01  4.24061340e+01  5.87381006e+01  6.03566597e+01
  7.18757362e+01  6.05789736e+01  3.49186173e+01  5.79904979e+01
  1.00199034e+02  1.54021288e+02  1.69385908e+02  1.76729155e+02
  1.49018708e+02  1.38052003e+02  1.45993816e+02  1.68536605e+02
  2.18145398e+02  2.50701235e+02  2.59312573e+02  2.69433694e+02
  3.00450759e+02  3.08362320e+02  3.22279806e+02  3.33100821e+02
  3.24834955e+02  3.20409108e+02 -1.82717068e-01 -1.37099418e+00
  8.97323033e-01 -2.96238361e+00 -5.91157492e+00 -1.32300589e+00
  3.83679506e+00  9.75524689e-02 -2.99331378e+00 -7.76614092e+00
 -2.75883389e+01 -3.58638443e+01 -2.55478946e+01 -7.99347389e+00
  2.45015867e+00  1.70505173e+00 -9.12309346e+00 -5.47197456e+00
 -2.32817909e+00 -4.07646709e+00  3.81310277e+00  2.17010986e+00
 -7.77758662e+00 -1.12744183e+01 -9.60068336e+00 -8.32196484e+00
 -6.55949392e+00 -5.89662097e+00 -8.67528405e+00 -1.00324369e+01
  8.35607874e-01 -6.18075775e+01  1.73148524e+02  3.00898867e+02
  2.87218972e+02  1.92250804e+02  3.11579619e+02  2.07562150e+02
 -2.09640950e+02 -4.20474873e+02 -5.45186110e+02 -4.95414418e+02
 -4.06770391e+02 -2.99777355e+02 -3.31638756e+02 -3.09447767e+02
 -3.29332028e+02 -2.69765726e+02 -2.98970291e+02 -5.86748601e+02
 -7.91719925e+02 -1.03554376e+03 -1.20638984e+03 -1.32599885e+03
 -1.48575011e+03 -1.55966406e+03 -1.61354108e+03 -1.69184788e+03
 -1.66936869e+03 -1.67064601e+03  2.64867411e-01 -1.64203465e+01
 -2.38977202e+00 -8.16671349e+00 -1.01507461e+01 -4.25073864e+01
 -2.63648903e+01 -4.05890126e+01 -7.00831616e+01 -1.00770408e+02
 -1.51249607e+02 -1.67781754e+02 -2.24966217e+02 -2.44643041e+02
 -2.65945968e+02 -2.71415195e+02 -2.74126485e+02 -2.42120074e+02
 -2.55914676e+02 -2.84780241e+02 -3.20780301e+02 -3.91683072e+02
 -4.20809993e+02 -4.56446350e+02 -5.01853776e+02 -5.15323308e+02
 -5.30985272e+02 -5.46424961e+02 -5.41552948e+02 -5.42189007e+02
 -1.04921640e-01 -1.86337928e+00  2.17369521e+00  9.23545349e+00
  1.55788614e+01  1.35391739e+01  1.06720111e+01  5.75276490e+00
 -3.78850505e+00 -4.49397423e+00  4.46674959e+00  1.16110638e+01
  1.69975758e+01  2.11005226e+01  1.82175813e+01  2.22635321e+01
  2.60913429e+01  3.30077862e+01  3.72601703e+01  4.29350540e+01
  4.85178640e+01  4.54989756e+01  4.78240644e+01  5.77879029e+01
  5.65276304e+01  5.21224128e+01  5.51355238e+01  5.01602270e+01
  4.14899024e+01  4.05485547e+01 -7.30978839e-01 -1.69091125e+00
  6.23276006e+00  1.44272610e+01  3.74870467e+01  5.65523787e+01
  5.73905248e+01  4.58304405e+01  4.64691075e+01  4.16097681e+01
  2.67930953e+01  4.14199472e+01  6.42651523e+01  7.08659792e+01
  6.66616339e+01  7.44890185e+01  6.37484569e+01  5.48239343e+01
  4.84361949e+01  6.36977547e+01  9.41392941e+01  1.02736673e+02
  1.08203633e+02  1.17249842e+02  1.21049125e+02  1.25033330e+02
  1.31710954e+02  1.43599694e+02  1.32972275e+02  1.31059269e+02
 -6.34955164e-02  1.15395847e+01  3.83299288e+01  6.78407286e+01
  1.00873833e+02  1.05141783e+02  1.04245812e+02  9.06181179e+01
  9.88299436e+01  1.03128498e+02  1.24724920e+02  1.58511432e+02
  1.50455185e+02  1.64230951e+02  1.52044889e+02  1.56091132e+02
  1.56601221e+02  1.58923994e+02  1.67454522e+02  2.11880509e+02
  2.56637749e+02  2.85924916e+02  3.00327770e+02  3.09003065e+02
  3.26655664e+02  3.15598615e+02  3.19159433e+02  3.20981379e+02
  3.03164037e+02  3.00893662e+02 -2.00064496e-02  6.25485158e+00
  1.78506547e+01  2.23866201e+01  2.33071798e+01  1.73837893e+01
  1.27260313e+01  4.32910350e+00  5.20587788e+00  1.09784984e+01
  1.47155117e+01  1.01420432e+01  2.10905588e+00  1.90382061e+00
  2.57305350e+00  8.07339406e+00  7.43529313e+00  1.18299396e+01
  1.65176726e+01  2.75454705e+01  3.55549236e+01  3.57369165e+01
  3.27897178e+01  3.32317853e+01  3.10906954e+01  2.40604781e+01
  2.66939141e+01  2.46003076e+01  1.86644262e+01  1.83773222e+01
 -7.05981040e-02  1.48098532e+01  5.13631531e+01  2.23420317e+01
  1.86687932e+01  2.19810783e+01  2.93615079e+01 -1.51550474e+01
 -9.79774778e+00 -1.32021641e+01 -1.21964762e+02 -1.76154225e+02
 -1.66684904e+02 -1.19474857e+02 -9.21876083e+01 -9.86839490e+01
 -1.27737777e+02 -1.08033682e+02 -1.28951078e+02 -1.32668878e+02
 -1.17874174e+02 -1.58225599e+02 -2.01020185e+02 -2.12508521e+02
 -2.23760148e+02 -2.33867580e+02 -2.28605719e+02 -2.29256399e+02
 -2.48420837e+02 -2.51484028e+02  6.17650951e-01  5.68022726e+00
  2.89281884e+01  4.69629947e+01  6.03122551e+01  7.47518525e+01
  6.88519990e+01  6.19047288e+01  5.51885286e+01  7.24579920e+01
  7.59323587e+01  8.70111971e+01  8.50284722e+01  7.51739375e+01
  7.20622559e+01  6.91558582e+01  5.37028057e+01  6.34384908e+01
  7.59629592e+01  7.94553746e+01  1.10714606e+02  1.18305737e+02
  1.22131443e+02  1.26702548e+02  1.36380553e+02  1.32013250e+02
  1.33523835e+02  1.36683017e+02  1.27840495e+02  1.26259195e+02]
In [53]:
money_regr = money_std
pred_y_regr = np.array(pred_y_regr)
money_regr = money_gain(n_test, pred_y_regr, money_regr)
day 0
money initial: 10000
day 1
money since long: 10062.955
money since short: 10070.441838519999
day 2
money since long: 10090.990575091499
money since short: 10333.275258799446
day 3
money since long: 10321.00449442962
money since short: 10352.642243581733
day 4
money since long: 10352.642243581733
money since short: 10620.984024615651
day 5
money since long: 10632.458670231244
money since short: 10632.458670231244
day 6
money since long: 10717.11962239296
money since short: 10795.338519956997
day 7
money since long: 10787.436332160387
money since short: 10787.436332160387
day 8
money since long: 10761.764930548927
money since short: 10856.877409005117
day 9
money since long: 10856.877409005117
money since short: 11002.951266104576
day 10
money since long: 11002.951266104576
money since short: 11017.692470063339
day 11
money since long: 11027.233791742414
money since short: 11120.092749098452
day 12
money since long: 11120.092749098452
money since short: 11120.092749098452
day 13
money since long: 11091.26946869279
money since short: 10945.632782116752
day 14
money since long: 10947.542795037232
money since short: 10947.542795037232
day 15
money since long: 10947.542795037232
money since short: 10993.666161275572
day 16
money since long: 10986.022814876946
money since short: 11072.907149561252
day 17
money since long: 11061.046681890675
money since short: 11086.58663867916
day 18
money since long: 11086.58663867916
money since short: 11141.958595646043
day 19
money since long: 11154.511404748862
money since short: 11156.892892933776
day 20
money since long: 11156.892892933776
money since short: 11169.940879172062
day 21
money since long: 11175.48675481857
money since short: 11175.48675481857
day 22
money since long: 11181.082879811045
money since short: 11181.082879811045
day 23
money since long: 11200.621822143516
money since short: 11340.584792433021
day 24
money since long: 11340.584792433021
money since short: 11445.71910132437
day 25
money since long: 11445.71910132437
money since short: 11433.864197765173
day 26
money since long: 11433.864197765173
money since short: 11425.23163029586
day 27
money since long: 11425.23163029586
money since short: 11487.998996564798
day 28
money since long: 11487.998996564798
money since short: 11452.116231699029
day 29
money since long: 11452.116231699029
money since short: 11557.633167628846
day 30
money since long: 11582.36072379099
money since short: 11708.139370070998
In [54]:
print('final money with linear regression:', money_regr)
final money with linear regression: 11708.139370070998

SVM

In [55]:
pred_y_svm = svm.predict(test_x_pca)
print('predict rate of return:')
print(pred_y_svm)
predict rate of return:
[-4.15765467e-01 -6.46674865e-01 -8.15702046e-01 -9.55672202e-01
 -1.12685273e+00 -1.33007483e+00 -1.55860215e+00 -1.82288279e+00
 -2.07686830e+00 -2.25498228e+00 -2.33243956e+00 -2.31038936e+00
 -2.20098407e+00 -2.03118809e+00 -1.83756617e+00 -1.66432840e+00
 -1.54827729e+00 -1.48987187e+00 -1.48318540e+00 -1.47652886e+00
 -1.44391645e+00 -1.35718341e+00 -1.23736118e+00 -1.10212454e+00
 -9.74339475e-01 -8.72017889e-01 -7.93864016e-01 -7.31242989e-01
 -6.70161764e-01 -6.44727442e-01  3.43593065e-01  3.66633279e-01
  3.90103218e-01  4.31619474e-01  5.04443085e-01  6.05167441e-01
  7.13403387e-01  8.10848547e-01  8.96751705e-01  9.54623495e-01
  1.01215794e+00  1.06046168e+00  1.09357224e+00  1.11392311e+00
  1.11992857e+00  1.13542153e+00  1.14991925e+00  1.16290735e+00
  1.18760331e+00  1.20757961e+00  1.22836039e+00  1.23316996e+00
  1.22690315e+00  1.18033889e+00  1.14622808e+00  1.11120619e+00
  1.06938090e+00  1.04140474e+00  1.01948621e+00  1.00670999e+00
 -6.23297032e+00 -2.26639781e+01 -3.39438678e+01 -3.80622685e+01
 -3.41667532e+01 -2.88312255e+01 -2.63453032e+01 -2.81543526e+01
 -3.14304817e+01 -3.22512557e+01 -2.82487523e+01 -2.05429788e+01
 -1.28027999e+01 -8.41399362e+00 -7.16243275e+00 -7.04126179e+00
 -7.68887482e+00 -7.04927475e+00 -4.83724566e+00 -3.20558421e+00
 -2.36334007e+00 -4.62275279e+00 -8.18993162e+00 -1.08481445e+01
 -1.23046145e+01 -1.34677696e+01 -1.48641882e+01 -1.57553294e+01
 -1.67674105e+01 -1.74180377e+01  4.95807794e+00  7.88632164e+00
  8.73510975e+00  9.01360553e+00  8.81897862e+00  9.62590090e+00
  1.07063077e+01  1.18799434e+01  1.24851894e+01  1.22167136e+01
  1.14336940e+01  1.01773921e+01  9.03037262e+00  8.37971466e+00
  7.80453649e+00  7.07176224e+00  6.39396175e+00  5.89057308e+00
  5.40599159e+00  5.28575002e+00  5.25051581e+00  5.43719119e+00
  5.60197136e+00  5.74526433e+00  5.51884596e+00  5.51687815e+00
  5.83280633e+00  6.09916061e+00  6.58267942e+00  6.81825314e+00
 -3.34721142e-01 -4.00273231e-01 -5.00483326e-01 -6.63134273e-01
 -8.43771652e-01 -1.01904535e+00 -1.16598156e+00 -1.44207019e+00
 -1.61428518e+00 -1.75289207e+00 -1.87962431e+00 -1.97724776e+00
 -2.02530619e+00 -2.01996698e+00 -1.96186595e+00 -1.85302911e+00
 -1.69644823e+00 -1.50449722e+00 -1.31202229e+00 -1.13211741e+00
 -9.68596311e-01 -8.28715824e-01 -7.03883097e-01 -6.22416008e-01
 -5.73566129e-01 -5.69652123e-01 -5.75946585e-01 -5.93950337e-01
 -6.35530460e-01 -7.06323531e-01 -1.32844307e+00 -7.58603924e-01
 -2.68710257e-01  1.13652429e-01  1.31953249e-01  2.24056119e-01
  1.53632174e-01  7.70794079e-02 -9.05996737e-02 -4.14409352e-01
 -8.03206469e-01 -1.31578588e+00 -1.84029841e+00 -2.27969842e+00
 -2.73422771e+00 -3.25870858e+00 -3.68635448e+00 -4.04691383e+00
 -4.29578741e+00 -4.40213843e+00 -4.41210965e+00 -4.24168882e+00
 -4.00828969e+00 -3.72933708e+00 -3.46141984e+00 -3.20563934e+00
 -2.90738775e+00 -2.64578944e+00 -2.35773385e+00 -2.29686795e+00
  1.37380891e+00  1.34661277e+00  1.44045805e+00  1.65930533e+00
  1.97845471e+00  2.26152081e+00  2.58298695e+00  2.91357626e+00
  3.22302655e+00  3.44301069e+00  3.55429536e+00  3.55236454e+00
  3.41354764e+00  3.18114415e+00  2.82823831e+00  2.50682783e+00
  2.23927415e+00  2.05889078e+00  1.97817988e+00  1.88640131e+00
  1.79310776e+00  1.67046906e+00  1.49306164e+00  1.28003612e+00
  1.08074536e+00  9.46915877e-01  8.76018569e-01  8.86734543e-01
  9.04406007e-01  1.03004641e+00 -7.26741126e-01 -7.08657693e-01
 -6.63672831e-01 -6.83781337e-01 -6.83485851e-01 -6.15916066e-01
 -5.14211460e-01 -4.94535837e-01 -4.39256196e-01 -2.97134524e-01
 -1.64913093e-01 -8.04111925e-02 -8.34408053e-03  6.60420714e-02
  1.02225855e-01  1.40774007e-01  1.90512368e-01  2.22411433e-01
  2.32743431e-01  2.15754978e-01  1.74549441e-01  1.24181254e-01
  7.65362831e-02  8.15408024e-04 -6.40012306e-02 -1.44969545e-01
 -2.22582792e-01 -2.79626288e-01 -3.60412472e-01 -4.22768964e-01
  5.34097269e+01  5.35194050e+01  5.17468440e+01  4.89042337e+01
  4.76173871e+01  4.37865615e+01  3.96179536e+01  3.43776049e+01
  2.76478078e+01  2.15786882e+01  2.01774539e+01  2.40071583e+01
  2.87104009e+01  3.60454816e+01  4.37988874e+01  5.31756240e+01
  6.21642672e+01  6.82914158e+01  7.34163607e+01  7.48413954e+01
  7.47055124e+01  7.29931134e+01  6.99896971e+01  6.34656612e+01
  5.90622020e+01  5.57979944e+01  5.32544162e+01  5.08395329e+01
  4.87235268e+01  4.89372335e+01 -3.99808649e-01 -5.04186393e-01
 -5.12088666e-01 -4.11539252e-01 -2.61103390e-01 -6.39397687e-02
  1.63407871e-01  3.51992672e-01  4.91891841e-01  6.47450083e-01
  8.46600992e-01  1.06205640e+00  1.23762515e+00  1.34156303e+00
  1.33195730e+00  1.23180634e+00  1.07665302e+00  9.17089463e-01
  7.88604110e-01  6.91708608e-01  6.17372849e-01  5.56639721e-01
  5.05984777e-01  4.47738285e-01  3.94644112e-01  3.54986778e-01
  3.41652208e-01  3.49659503e-01  3.84082281e-01  4.04152387e-01]
In [56]:
money_svm = money_std
pred_y_svm = np.array(pred_y_svm)
money_svm = money_gain(n_test, pred_y_svm, money_svm)
day 0
money initial: 10000
day 1
money since long: 10057.98625
money since short: 10029.861605948438
day 2
money since long: 9907.364398689588
money since short: 10029.122191887833
day 3
money since long: 9978.284571497154
money since short: 9973.364029917833
day 4
money since long: 9973.364029917833
money since short: 10141.583514339954
day 5
money since long: 10298.4611343274
money since short: 10315.005612139696
day 6
money since long: 10352.371719969671
money since short: 10305.198550134697
day 7
money since long: 10357.505161675545
money since short: 10395.423988072438
day 8
money since long: 10415.585912897304
money since short: 10468.64811533056
day 9
money since long: 10487.09910763383
money since short: 10550.84756133436
day 10
money since long: 10567.739468280057
money since short: 10600.837628294712
day 11
money since long: 10600.837628294712
money since short: 10641.466663609854
day 12
money since long: 10784.863087268664
money since short: 10676.89851911779
day 13
money since long: 10751.967792605708
money since short: 10666.538032509561
day 14
money since long: 10710.99882966357
money since short: 10676.070262480036
day 15
money since long: 10676.070262480036
money since short: 10722.730027562204
day 16
money since long: 10690.023020295632
money since short: 10823.605547957246
day 17
money since long: 10877.101218378026
money since short: 10902.21644509126
day 18
money since long: 10902.21644509126
money since short: 10924.695452623982
day 19
money since long: 10984.715729440699
money since short: 10987.060966248935
day 20
money since long: 11000.509128871623
money since short: 10999.615337504902
day 21
money since long: 11087.166775783773
money since short: 11087.166775783773
day 22
money since long: 11115.86590698289
money since short: 11120.806909378543
day 23
money since long: 11215.34071861258
money since short: 11293.245279079243
day 24
money since long: 11293.245279079243
money since short: 11360.262219876618
day 25
money since long: 11382.982744316372
money since short: 11403.64001225162
day 26
money since long: 11395.583340582964
money since short: 11397.492100792511
day 27
money since long: 11397.492100792511
money since short: 11442.282820062112
day 28
money since long: 11480.065237933957
money since short: 11472.715126165369
day 29
money since long: 11472.715126165369
money since short: 11566.861660580074
day 30
money since long: 11554.99406051632
money since short: 11635.217495529969
In [57]:
print('final money with svm:', money_svm)
final money with svm: 11635.217495529969

DNN

In [58]:
pred_y_dnn = model.predict(test_x_pca).flatten()  # DNN-predicted rate of return per (stock, day)
print('predict rate of return:')
print(pred_y_dnn)
WARNING:tensorflow:Layer dense is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

predict rate of return:
[ 4.3653727e+00  4.0122428e+00  2.5223835e+00  5.6383747e-01
  7.5103408e-01 -4.2085987e-01 -3.5695547e-01  5.5728465e-01
  1.4599801e+00  2.6985385e+00  2.4696755e+00  1.5938672e+00
  1.3684014e+00  1.3466676e+00  8.8318807e-01  8.9643076e-02
 -8.4486932e-01 -1.3004278e+00 -6.8328398e-01  3.7870055e-01
  1.0826930e+00  6.8865854e-01  4.5050010e-02 -1.2736186e-02
  4.6757740e-01  9.6857893e-01  2.4200413e+00  3.0043600e+00
  3.6445434e+00  3.1296175e+00  4.2386514e-01  1.7510970e-01
 -4.2580768e-02 -2.2292821e-01 -8.4733313e-01 -1.4424068e+00
 -1.9278289e+00 -1.9179384e+00 -1.8379031e+00 -1.7637349e+00
 -1.6836072e+00 -1.5834981e+00 -1.4685251e+00 -1.3173991e+00
 -1.0155774e+00 -7.3706454e-01 -5.2265567e-01 -4.2105597e-01
 -5.2801722e-01 -5.9955734e-01 -7.5152761e-01 -8.6067361e-01
 -9.7160858e-01 -1.0228256e+00 -1.1052693e+00 -1.2227962e+00
 -1.3488628e+00 -1.5058376e+00 -1.6040488e+00 -1.5878564e+00
  3.8435791e+00 -7.6332884e+00 -2.1505311e+00  3.3825195e+00
  1.0238160e+00 -1.1784411e+01 -2.4524708e+01 -2.0864399e+01
 -1.8460804e+01 -1.7131886e+01 -1.4816974e+01 -1.3889056e+01
 -1.1466876e+01 -1.5193179e+01 -9.9127684e+00 -4.0601988e+00
  1.0081300e+00  4.1620607e+00  5.0879641e+00  4.3816042e+00
  2.5932128e+00  8.0451661e-01 -7.3115844e-01 -2.2453492e+00
 -3.7765992e+00 -5.0562925e+00 -5.8453550e+00 -6.3029661e+00
 -6.4195433e+00 -6.3678770e+00  1.5733472e+01  1.8170374e+01
  2.0119003e+01  2.1494566e+01  2.2187634e+01  2.2193834e+01
  2.0462202e+01  1.7148834e+01  1.2729436e+01  8.3416815e+00
  5.0246897e+00  3.1005485e+00  3.0773933e+00  3.4303215e+00
  3.5329235e+00  3.5931537e+00  3.2276266e+00  2.4003942e+00
  1.4633836e+00  5.0012666e-01 -3.6974117e-02 -9.1612980e-02
  7.8804806e-02  3.3610803e-01  1.2657708e+00  2.9070976e+00
  4.3041649e+00  5.5165281e+00  6.1819391e+00  6.1706791e+00
 -1.5561406e-01 -3.2547492e-01 -4.7565287e-01 -6.0153216e-01
 -6.5487880e-01 -6.1595267e-01 -1.6281144e-01 -1.0638444e-01
 -2.5092238e-01 -4.7292632e-01 -8.2328910e-01 -1.1676990e+00
 -1.1796209e+00 -1.1401492e+00 -8.9964408e-01 -7.2499007e-01
 -6.2375754e-01 -5.9256858e-01 -6.3134879e-01 -5.8031768e-01
 -3.9348239e-01 -2.2449033e-01 -7.7693149e-02  1.9669369e-02
  7.9445675e-02  9.0588406e-02  8.8063076e-02  6.8432644e-02
  2.3325756e-02 -4.0646717e-02  2.5892532e+00  2.8746459e+00
  3.0489242e+00  3.1293151e+00  3.0145004e+00  2.9257476e+00
  2.7582672e+00  2.5543182e+00  2.3235672e+00  2.0679824e+00
  1.8015832e+00  1.5158509e+00  1.3166627e+00  1.1871918e+00
  1.0056275e+00  8.0095369e-01  5.7748109e-01  3.4793550e-01
  1.8880160e-01  7.1667507e-02  9.9380761e-03 -6.3563511e-02
 -8.6100742e-02 -8.9076206e-02 -7.5022861e-02 -8.5299656e-02
 -3.1016514e-02  1.8620327e-02  1.1077578e-01  6.4595059e-02
 -6.7706412e-01 -9.5529670e-01 -1.1676627e+00 -1.3721324e+00
 -1.7778541e+00 -2.2283738e+00 -2.4730103e+00 -2.3491042e+00
 -2.0311844e+00 -1.5521604e+00 -1.0445987e+00 -6.3750285e-01
 -3.8656825e-01 -4.4995898e-01 -5.8494204e-01 -8.0796641e-01
 -1.1008817e+00 -1.3693091e+00 -1.7028934e+00 -2.0259678e+00
 -2.3441350e+00 -2.6724799e+00 -2.9578636e+00 -3.0426290e+00
 -2.9260309e+00 -2.6858299e+00 -2.3532231e+00 -2.0712287e+00
 -1.6366526e+00 -1.6565727e+00 -2.1216002e+00 -2.6402290e+00
 -2.9901731e+00 -3.1494853e+00 -3.0216544e+00 -2.7040145e+00
 -2.2805767e+00 -1.8941225e+00 -1.5380207e+00 -1.2031223e+00
 -9.7846121e-01 -8.5989565e-01 -8.2676357e-01 -8.2153696e-01
 -8.7010926e-01 -8.6698979e-01 -8.1050414e-01 -7.3322457e-01
 -6.1186188e-01 -4.9841899e-01 -4.1126150e-01 -3.7868661e-01
 -4.0258640e-01 -5.0356835e-01 -6.0456294e-01 -7.3250288e-01
 -8.4732538e-01 -9.6171802e-01 -1.0828304e+00 -1.1013852e+00
  4.7594601e+01  5.9266567e+01  7.0870377e+01  7.0320793e+01
  5.0199306e+01  2.1959133e+01 -6.0824003e+00 -2.0079868e+01
 -1.3026477e+01  8.0436525e+00  2.7816845e+01  3.9150494e+01
  4.7524761e+01  4.6755337e+01  3.6515728e+01  2.6637966e+01
  2.1193996e+01  2.2013378e+01  2.5883968e+01  3.1679592e+01
  3.5432140e+01  3.4833172e+01  3.1031155e+01  3.2448376e+01
  2.9319271e+01  2.6000362e+01  2.5161297e+01  2.6330044e+01
  2.7613064e+01  2.7515713e+01 -1.2364680e+00 -5.1509589e-01
 -1.8200394e+00 -2.0274041e+00 -2.0697229e+00 -2.1195409e+00
 -2.5360377e+00 -2.9591234e+00 -1.5892469e+00 -1.0510226e+00
 -5.6664485e-01 -1.7196862e-01 -1.8975331e-01 -4.0268117e-01
 -7.0683318e-01 -9.4600755e-01 -9.9097413e-01 -7.9334527e-01
 -8.0032080e-01 -1.0815402e+00 -1.2807113e+00 -1.3884908e+00
 -1.4067594e+00 -1.2560791e+00 -1.0214725e+00 -3.9441168e-01
 -1.1997144e-01 -1.7039506e-01 -7.4140567e-01 -1.1538879e+00]
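The TensorFlow dtype warning above is harmless here: it only notes that the float64 input matrix is autocast to the layers' default float32. A minimal way to silence it, assuming test_x_pca is a NumPy array, is to cast the features explicitly before predicting:

# Cast the PCA-transformed features to float32 so they already match the
# Keras layers' default dtype and no autocast warning is emitted.
pred_y_dnn = model.predict(test_x_pca.astype(np.float32)).flatten()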
In [59]:
money_dnn = money_std              # start again from the initial cash (10,000)
pred_y_dnn = np.array(pred_y_dnn)  # ensure the DNN predictions are a NumPy array
money_dnn = money_gain(n_test, pred_y_dnn, money_dnn)  # simulate the daily long/short strategy
day 0
money initial: 10000
day 1
money since long: 10064.49625
money since short: 10064.49625
day 2
money since long: 10064.49625
money since short: 10185.000979725313
day 3
money since long: 10185.000979725313
money since short: 10186.182439838962
day 4
money since long: 10186.182439838962
money since short: 10278.317733280108
day 5
money since long: 10439.016661460508
money since short: 10439.016661460508
day 6
money since long: 10373.219539443322
money since short: 10339.077383981687
day 7
money since long: 10415.927746176822
money since short: 10454.060457655574
day 8
money since long: 10454.060457655574
money since short: 10459.061418827006
day 9
money since long: 10459.061418827006
money since short: 10510.560529870634
day 10
money since long: 10655.995155922454
money since short: 10708.220188181629
day 11
money since long: 10708.220188181629
money since short: 10774.233688586723
day 12
money since long: 10919.41918109885
money since short: 10919.41918109885
day 13
money since long: 10952.432680065505
money since short: 10873.690165312175
day 14
money since long: 10908.011609107713
money since short: 10908.011609107713
day 15
money since long: 10908.011609107713
money since short: 10951.545483439662
day 16
money since long: 10922.69363686354
money since short: 10987.918501916069
day 17
money since long: 11042.226289111788
money since short: 11057.381744693594
day 18
money since long: 11057.381744693594
money since short: 11064.939465116091
day 19
money since long: 11118.557395529177
money since short: 11118.557395529177
day 20
money since long: 11132.166509781304
money since short: 11132.166509781304
day 21
money since long: 11227.67354083116
money since short: 11240.240114441734
day 22
money since long: 11265.731573991274
money since short: 11265.731573991274
day 23
money since long: 11363.220990383254
money since short: 11363.220990383254
day 24
money since long: 11363.220990383254
money since short: 11395.988258511647
day 25
money since long: 11418.780235028671
money since short: 11442.69116084082
day 26
money since long: 11434.606899535685
money since short: 11455.280668810045
day 27
money since long: 11455.280668810045
money since short: 11521.47357624468
day 28
money since long: 11559.51748199344
money since short: 11550.310326319031
day 29
money since long: 11550.310326319031
money since short: 11653.772231067034
day 30
money since long: 11661.720103728621
money since short: 11775.06036141676
In [60]:
print('final money with dnn model:', money_dnn)
final money with dnn model: 11775.06036141676

Result

Two helper functions are defined below: get_label turns each test day's ten returns (actual or predicted) into a ranking of the stocks, and get_mse measures the raw prediction error.

In [61]:
def get_label(n_test, y):
    # Convert a flat array of returns (stock-major: all of stock 0's test
    # days, then stock 1's, ...) into per-day rank labels.
    label = []
    for test_day in range(n_test):
        sort_idx = y[test_day::n_test]      # the ten stocks' values on this test day
        label.append(np.argsort(sort_idx))  # stock indices ordered from lowest to highest
    label = np.array(label).flatten()
    return label
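As a sanity check on get_label, a toy call with hypothetical numbers (three "stocks" over two test days, stock-major) shows the per-day ordering it produces:

# stock A: [0.5, -1.0], stock B: [2.0, 0.3], stock C: [-0.2, 1.1]
toy_y = np.array([0.5, -1.0, 2.0, 0.3, -0.2, 1.1])
print(get_label(2, toy_y))
# day 0 values [0.5, 2.0, -0.2] -> argsort -> [2 0 1]
# day 1 values [-1.0, 0.3, 1.1] -> argsort -> [0 1 2]
# flattened result: [2 0 1 0 1 2]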
In [62]:
def get_mse(test_y, pred_y):
    # Mean squared error between actual and predicted rates of return.
    mse = ((test_y - pred_y) ** 2).mean()
    return mse
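For reference, this hand-rolled metric matches sklearn's built-in one, which can serve as a cross-check (reusing pred_y_regr computed earlier):

from sklearn.metrics import mean_squared_error

# Should agree with get_mse up to floating-point rounding.
assert np.isclose(get_mse(test_y, pred_y_regr),
                  mean_squared_error(test_y, pred_y_regr))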

final money

In [63]:
print('final money with linear regression:', money_regr)
print('final money with svm:', money_svm)
print('final money with dnn model:', money_dnn)
final money with linear regression: 11708.139370070998
final money with svm: 11635.217495529969
final money with dnn model: 11775.06036141676
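For a quick visual comparison, a bar chart of the three final balances against the 10,000 starting cash (using matplotlib, which was imported at the top):

models = ['linear regression', 'SVM', 'DNN']
final_money = [money_regr, money_svm, money_dnn]

plt.bar(models, final_money)
plt.axhline(10000, color='gray', linestyle='--', label='initial cash')
plt.ylabel('final money (NTD)')
plt.legend()
plt.show()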

Confusion matrix, Accuracy, Sensitivity(Recall), Precision

In [64]:
test_label = get_label(n_test, test_y)
pred_label_regr = get_label(n_test, pred_y_regr)
pred_label_svm = get_label(n_test, pred_y_svm)
pred_label_dnn = get_label(n_test, pred_y_dnn)

import modules

In [65]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, recall_score, precision_score
In [66]:
print('Confusion matrix')
print('linear regression:', confusion_matrix(test_label, pred_label_regr), sep='\n')
print('SVM:', confusion_matrix(test_label, pred_label_svm), sep='\n')
print('DNN:', confusion_matrix(test_label, pred_label_dnn), sep='\n')

print('Accuracy')
print('linear regression:', accuracy_score(test_label, pred_label_regr))
print('SVM:', accuracy_score(test_label, pred_label_svm))
print('DNN:', accuracy_score(test_label, pred_label_dnn))

print('Sensitivity(Recall)')
print('linear regression:', recall_score(test_label, pred_label_regr, average=None))
print('SVM:', recall_score(test_label, pred_label_svm, average=None))
print('DNN:', recall_score(test_label, pred_label_dnn, average=None))

print('Precision')
print('linear regression:', precision_score(test_label, pred_label_regr, average=None))
print('SVM:', precision_score(test_label, pred_label_svm, average=None))
print('DNN:', precision_score(test_label, pred_label_dnn, average=None))

print('Mean Square Error')
print('linear regression:', get_mse(test_y, pred_y_regr))
print('SVM:', get_mse(test_y, pred_y_svm))
print('DNN:', get_mse(test_y, pred_y_dnn))
Confusion matrix
linear regression:
[[3 4 0 2 3 6 4 2 4 2]
 [3 4 0 0 5 4 1 2 5 6]
 [4 0 5 9 3 0 5 1 3 0]
 [6 1 9 5 1 1 3 1 0 3]
 [0 3 2 4 2 3 4 3 6 3]
 [1 3 1 2 4 8 1 6 1 3]
 [3 2 3 4 1 2 0 7 4 4]
 [4 7 0 0 3 3 1 5 2 5]
 [2 1 5 2 3 1 8 1 4 3]
 [4 5 5 2 5 2 3 2 1 1]]
SVM:
[[ 4  4  1  3  5  4  4  1  1  3]
 [ 2  3  0  3  3  1  6  5  0  7]
 [ 5  1  6  3  1  5  1  1  5  2]
 [ 0  1 11  4  1  3  2  2  6  0]
 [ 5  0  1  3  1  4  3  4  4  5]
 [ 2  8  2  0  3  2  3  4  2  4]
 [ 5  3  1  2  0  7  3  5  3  1]
 [ 2  6  0  2  8  1  3  5  1  2]
 [ 3  1  4  6  2  2  3  1  5  3]
 [ 2  3  4  4  6  1  2  2  3  3]]
DNN:
[[5 3 1 2 7 5 1 2 1 3]
 [5 1 1 2 3 8 5 4 0 1]
 [1 2 4 4 2 3 4 2 6 2]
 [2 0 8 4 0 1 5 3 5 2]
 [3 7 1 4 2 1 2 4 4 2]
 [4 4 1 0 5 3 3 2 2 6]
 [4 3 2 1 1 2 5 5 3 4]
 [3 4 2 2 4 4 2 5 1 3]
 [2 1 6 7 2 2 0 2 4 4]
 [1 5 4 4 4 1 3 1 4 3]]
Accuracy
linear regression: 0.12333333333333334
SVM: 0.12
DNN: 0.12
Sensitivity(Recall)
linear regression: [0.1        0.13333333 0.16666667 0.16666667 0.06666667 0.26666667
 0.         0.16666667 0.13333333 0.03333333]
SVM: [0.13333333 0.1        0.2        0.13333333 0.03333333 0.06666667
 0.1        0.16666667 0.16666667 0.1       ]
DNN: [0.16666667 0.03333333 0.13333333 0.13333333 0.06666667 0.1
 0.16666667 0.16666667 0.13333333 0.1       ]
Precision
linear regression: [0.1        0.13333333 0.16666667 0.16666667 0.06666667 0.26666667
 0.         0.16666667 0.13333333 0.03333333]
SVM: [0.13333333 0.1        0.2        0.13333333 0.03333333 0.06666667
 0.1        0.16666667 0.16666667 0.1       ]
DNN: [0.16666667 0.03333333 0.13333333 0.13333333 0.06666667 0.1
 0.16666667 0.16666667 0.13333333 0.1       ]
Mean Square Error
linear regression: 97307.0731520697
SVM: 324.26186742551266
DNN: 161.55144644209508
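Note that the per-class recall and precision arrays above are identical, and accuracy, weighted recall, and weighted precision all collapse to the same number. That is a property of the labeling scheme rather than of the models: get_label emits each rank exactly once per day, so every class occurs exactly n_test = 30 times in both the true and the predicted labels, which makes per-class precision and recall both equal TP/30. A quick check on the label counts:

# Every class should occur exactly n_test times in the truth and in each prediction.
print(np.bincount(test_label))
print(np.bincount(pred_label_dnn))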
In [69]:
print('\t\t\tlinear regression', 'SVM', '\tDNN', sep='\t\t\t')
print('Accuracy:\t\t', accuracy_score(test_label, pred_label_regr),'\t\t\t', accuracy_score(test_label, pred_label_svm), '\t\t\t\t', accuracy_score(test_label, pred_label_dnn))
print('Sensitivity(Recall):\t', recall_score(test_label, pred_label_regr, average='weighted'),'\t\t\t', recall_score(test_label, pred_label_svm, average='weighted'), '\t\t\t\t', recall_score(test_label, pred_label_dnn, average='weighted'))
print('Precision:\t\t', precision_score(test_label, pred_label_regr, average='weighted'),'\t\t\t', precision_score(test_label, pred_label_svm, average='weighted'), '\t\t\t\t', precision_score(test_label, pred_label_dnn, average='weighted'))
print('Mean Square Error\t', get_mse(test_y, pred_y_regr),'\t\t\t', get_mse(test_y, pred_y_svm), '\t\t', get_mse(test_y, pred_y_dnn))
                      linear regression    SVM                  DNN
Accuracy:             0.12333333333333334  0.12                 0.12
Sensitivity(Recall):  0.12333333333333334  0.12                 0.12
Precision:            0.12333333333333334  0.12                 0.12
Mean Square Error:    97307.0731520697     324.26186742551266   161.55144644209508
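The tab-aligned print above is brittle; the same summary can be built as a DataFrame (pandas is already imported), which handles column alignment itself:

summary = pd.DataFrame(
    {'linear regression': [accuracy_score(test_label, pred_label_regr),
                           recall_score(test_label, pred_label_regr, average='weighted'),
                           precision_score(test_label, pred_label_regr, average='weighted'),
                           get_mse(test_y, pred_y_regr)],
     'SVM': [accuracy_score(test_label, pred_label_svm),
             recall_score(test_label, pred_label_svm, average='weighted'),
             precision_score(test_label, pred_label_svm, average='weighted'),
             get_mse(test_y, pred_y_svm)],
     'DNN': [accuracy_score(test_label, pred_label_dnn),
             recall_score(test_label, pred_label_dnn, average='weighted'),
             precision_score(test_label, pred_label_dnn, average='weighted'),
             get_mse(test_y, pred_y_dnn)]},
    index=['Accuracy', 'Sensitivity(Recall)', 'Precision', 'Mean Square Error'])
print(summary)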

Conclusion & Application

  • By prediction error (MSE), DNN > SVM > linear regression, and the DNN also ends with the most money (11,775 vs. 11,708 for linear regression and 11,635 for SVM).
  • All three models reach only about 12% rank accuracy, barely above the 10% chance level for ten classes, so the edge over random ranking is small.

Division of labor

  • 0616098黃秉茂 - 44%
  • 0616031李昀奇 - 28%
  • 0616309王祥任 - 28%